Script 8: Interaction Effects & Non-Linearity

Run the code in this entire script in your own local R script. Always annotate your code including the output or result of the code you run unless it is trivial or already discussed below.



Main Commands Introduced in this Session

lm() # linear regressions
predict() # predict values based on (linear) models
seq() # create a sequence of values
plot() # plot figures
lines() # add lines to plots 
segments() # add line segments to a plot

Do Voters Hold Politicians Accountable for Economic Growth?

So far in this course, we have used regression to study the isolated, linear effect of independent variables on an outcome of interest. Now, we are going to explore techniques to include non-independent effects (i.e. how the effect of one independent variable is moderated by another variable) as well as non-linear terms in a regression.

Substantively, we will consider the electoral consequences of global market integration. An existing literature seeks to establish that the transnational flow of goods, services and capital has important consequences for domestic policy outcomes. Specifically, critics often argue that economic openness constrains the autonomy of governments and forces policy convergence. At the same time, an existing literature on retrospective voting argues that voters hold elected officials accountable for economic growth. When voters decide on whom to support, the state of the macro-economy (rates of GDP growth, unemployment, inflation, etc.) is often considered a strong determinant of electoral outcomes.

Yet, if national economies are increasingly integrated into the global economy with the ability of governments to set economic policy being constrained, governments may not actually be responsible for macroeconomic outcomes - as those are increasingly determined by global trends. In closed economies, it is difficult for politicians to escape blame for poor economic performances, but in open economies, politicians may be able to shift blame to forces beyond their control - the global economy.

The question then is whether voters believe them.

In this context, Timothy Hellwig and David Samuels (2007) ask: What effect does the opening of the world’s economies have on voters’ perceptions of politicians’ competence as economic managers, and in turn, on their propensity to hold the politicians accountable for economic performance? Hellwig and Samuels develop a novel hypothesis about how the constraints of economic integration affect government accountability: They posit that greater exposure to the world economy reduces electoral accountability in the world’s democracies.

Note that this argument entails two components.

  1. Voters punish elected officials for poor economic growth.

  2. Voters acknowledge the constraints of interdependence and adjust their expectations about elected officials’ responsibility for macroeconomic outcomes.

If Hellwig and Samuels’ hypothesis holds, we should expect to see:

  1. Support for incumbent politicians will be higher in elections following strong economic growth, and

  2. The effect of economic growth on support for incumbents will be moderated by a country’s level of integration into the global economy. Incumbents overseeing poor growth will be punished more in relatively closed economies than poor growth incumbents in relatively open economies.

For more details, see: Timothy Hellwig and David Samuels (2007), “Voting in Open Economies: The Electoral Consequences of Globalization.” Comparative Political Studies 40 (3): 283-306.

We will evaluate Hellwig and Samuels’ hypothesis using Levin’s dataset on the determinants of incumbent vote share in elections, which we already used before. Let’s investigate these components of the theory in turn.

To do so, we load the data set and familiarise ourselves with the key variables.


setwd("~")
levin <- read.csv("levin.csv")

summary(levin$delta_vote)

hist(levin$delta_vote)

table(levin$binary_growth, levin$binary_openness) # growth in rows, openness in columns

| Variable Name | Variable Description |
|---|---|
| delta_vote | Change in incumbent’s vote share since previous election |
| int_support | Great power intervention supporting incumbent (0,1) |
| int_oppose | Great power intervention opposing incumbent (0,1) |
| eff_num_parties | Effective number of parties |
| iso | 3-letter country code |
| year | Year |
| gdp_percapita | GDP per capita |
| gdp_growth | Continuous measure of economic growth |
| binary_growth | GDP per capita growth > 2%? (0,1) |
| trade_openness | Trade volume in % of GDP |
| binary_openness | Trade > 35% of GDP? (0,1) |

Our outcome of interest, this time, is delta_vote, representing the change in incumbents’ vote share since the previous election. To evaluate the hypothesis, we need to model the effect of GDP growth and of trade openness on the change in vote share.

We will begin by estimating the linear effect of economic growth (binary_growth) on support for incumbents (delta_vote). If incumbents receive higher vote shares when countries have large economic growth, then the coefficient on binary_growth should be positive and meaningful. If the coefficient is statistically significant, we can reject the null hypothesis that the effect in the population is zero. Specifically, we are estimating the following equation:

\[\Delta Incumbent Vote Share = \alpha + \beta_1 EconomicGrowth + \beta_2 Economic Openness + \sum_{j=1}^{k}\gamma_j Controls_j + \epsilon\]

# ------ Linear additive model

m1 <- lm(delta_vote ~ binary_growth + binary_openness +
                      as.factor(int_support) + as.factor(int_oppose) + 
                      eff_num_parties + gdp_percapita, data=levin)

summary(m1)

library(stargazer)
stargazer(m1,type="text") # Create a table for the regression output

Making Predictions: Comparative Statics

Our estimated coefficient for binary_growth is positive and statistically significant. Holding the other covariates constant, incumbents in countries experiencing large economic growth see, on average, a change in vote share that is 3.83 percentage points higher than that of incumbents in countries experiencing low economic growth.

But how does the effect of GDP growth vary depending on whether the country is economically open to trade or not? A first way of examining this is to compare the predicted values of delta_vote at different levels of binary_growth for open vs. closed economies.

• We can obtain predicted values of delta_vote for all values of the binary_growth variable using the predict() command.

• We create a matrix with a range of predicted values of incumbent vote share as the variable of interest - binary_growth in this case - changes, holding the values of the other variables constant.

This way, we obtain the predicted values of delta_vote for each possible value of economic growth (in this case, 0 and 1), holding other variables at a chosen value (0 in the case of binary variables, the mean in the case of continuous variables).

  1. In particular, we create two matrices: in m1_fit_closed, we predict the values of delta_vote at different values of binary_growth for closed economies (when binary_openness is set to 0).

  2. In m1_fit_open, binary_openness is set to 1.

In short, we are predicting levels of delta_vote and their uncertainties across the range of economic growth (binary_growth) and economic openness (binary_openness) values. Note that the interval="confidence" option will add confidence intervals to the predicted values.

# ---- Predicted values: does the effect vary by economic openness?

## Fitted values in non-trading countries
m1_fit_closed <- predict(m1, interval="confidence",
                         newdata = data.frame(binary_growth = c(0,1),
                         binary_openness =0,
                         int_support = 0,
                         int_oppose = 0,
                         eff_num_parties=
                         mean(levin$eff_num_parties,na.rm=T),
                         gdp_percapita=mean(levin$gdp_percapita, na.rm=T)))

## Fitted values in trading countries
m1_fit_open <- predict(m1, interval="confidence",
                       newdata = data.frame(binary_growth = c(0,1),
                       binary_openness = 1,
                       int_support = 0,
                       int_oppose = 0,
                       eff_num_parties=mean(
                       levin$eff_num_parties, na.rm=T),
                       gdp_percapita= mean(levin$gdp_percapita, na.rm=T)))

# compare the predicted values of delta_vote for low and high economic growth
m1_fit_closed #in closed economies
m1_fit_open # in open economies

The 95% confidence intervals for the predicted values of delta_vote in closed and open economies overlap at both low and high economic growth. This suggests that predicted incumbent support does not differ significantly (at \(\alpha\) = 0.05) between open and closed economies. Note that the additive model cannot tell us whether the effect of economic growth itself differs by openness, because it constrains that effect to be the same in both groups.

Note: Strictly speaking, when confidence intervals overlap the difference between them could still be statistically significant. To make a confident claim about significance, we would need to estimate the confidence intervals around the difference between the estimates. However, in this case the confidence intervals are overlapping to such a large extent that it would be extremely unlikely to find a statistically significant - and substantively meaningful - difference.
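The point in the note above can be checked directly. In an additive model, the difference between the predictions for open and closed economies (at any fixed level of growth) is just the coefficient on the openness dummy, so a confidence interval around the difference is available from confint(). The following is a minimal sketch on simulated data; the data frame sim and the variables y, growth and open are hypothetical stand-ins, not the levin data.

```r
# Simulated stand-in data (hypothetical; not the levin dataset)
set.seed(1)
n <- 200
sim <- data.frame(growth = rbinom(n, 1, 0.5),
                  open   = rbinom(n, 1, 0.5))
sim$y <- -5 + 3 * sim$growth + 0.5 * sim$open + rnorm(n, sd = 4)

# Additive model, analogous in structure to m1
fit <- lm(y ~ growth + open, data = sim)

# Predictions for closed (open = 0) and open (open = 1) economies at growth = 0
nd <- data.frame(growth = 0, open = c(0, 1))
p  <- predict(fit, newdata = nd, interval = "confidence")

# Difference between the two predictions ...
diff_pred <- unname(p[2, "fit"] - p[1, "fit"])

# ... equals the coefficient on 'open' in an additive model
diff_coef <- unname(coef(fit)["open"])

# Confidence interval around the difference itself
ci_diff <- confint(fit, "open", level = 0.95)

print(p)
print(ci_diff)
```

If the interval in ci_diff contains zero, the difference between the two groups is not statistically significant at the 5% level, regardless of how much the two prediction intervals overlap.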

To better understand this relationship, we can also visualize it by plotting the predicted values. What you will notice is that the slope of the two effects is exactly the same (i.e. the lines connecting the values of \(\hat{y}\) for open and closed economies are parallel). Recall that OLS aims at allowing us to disentangle the effects of individual predictors. The lines are parallel because, by design, we estimate a separate, independent effect for each independent variable. It might not always be appropriate to assume that the lines should be parallel - indeed, sometimes we would expect them not to be. This is why we often use interaction terms.

# ---- Plotting predicted values

# Create empty plot
plot(1:2, m1_fit_closed[,"fit"], type="n",
     xlim=c(-0.1,1.25), #limits of the x-axis
     ylim=c(-10,2), #limits of y-axis
     xlab="Economic growth", #label for x-axis
     ylab="Change in incumbent vote share", #label for y-axis
     axes=FALSE,
     main="Effect of economic growth on incumbent support \n by economic openness")
axis(side=1, at=c(0.05,1.05), # create a new x-axis, which starts slightly above zero
     labels = c("No growth", "Economic Growth"))
axis(side=2)

# Now fill the empty plot
# Add predictions for closed economies
points(0:1, m1_fit_closed[,"fit"],
       col="red", pch=4) # Add fitted values
segments(0, m1_fit_closed[1,"fit"],
         1, m1_fit_closed[2,"fit"], lty="dashed", col = "red") #link
segments(0:1, m1_fit_closed[,"lwr"],
         0:1, m1_fit_closed[,"upr"], col="red") # conf. int.
text(1, m1_fit_closed[2,"fit"]+2,
     "Closed economies", col="red") # text

# Add predictions for open economies
points(0.1:1.1,m1_fit_open[,"fit"], col="blue") #fitted values
segments(0.1, m1_fit_open[1,"fit"],
         1.1, m1_fit_open[2,"fit"], lty="dashed", col = "blue") #link
segments(0.1:1.1, m1_fit_open[,"lwr"],
         0.1:1.1, m1_fit_open[,"upr"], col="blue") # conf. int.
text(1.05, m1_fit_open[2,"fit"]-2, "Open economies", col="blue") #text

Interaction Effects

Is the additive OLS model such as the one specified above a fair representation of the data generating process?

Recall that each beta coefficient represents the constant effect of one independent variable, i.e. the linear unit-change in that variable holding other predictors constant. How can we test Hellwig and Samuels’s claim that the effect of economic growth on support for incumbents is conditional on levels of economic integration? Economic growth may have direct effects on support for incumbents, but the effects might also vary depending on (i.e., be moderated by) political and social institutions that condition expectations. In countries with high levels of trade as a share of GDP, economic growth may be determined more strongly by global macroeconomic trends than by domestic economic policy—the component of economic growth that incumbents have the most control over. Therefore, the effect of economic growth shocks might be different in countries that are relatively open to international trade compared to countries that are relatively closed.

This suggests there may be heterogeneous effects of economic growth on support for incumbents based on underlying levels of economic openness. How could we model this relationship? The argument is that the effect of economic growth on support for incumbents depends on a country’s underlying level of economic openness. In other words, economic openness moderates the effect of economic growth on support for incumbents. Note that a moderator is not the same as a control variable: a moderator need not be correlated with both \(X\) and \(Y\); it simply allows the effect of \(X\) on \(Y\) to vary across values of the moderator.

We can examine this kind of conditional relationship by adding a multiplicative interaction term to the OLS model. This allows the effect of economic growth on support for incumbents to be different in relatively open and relatively closed countries. A more appropriate representation of the data generating process may therefore be expressed by the following model:

\[\Delta Incumbent Vote Share = \alpha + \beta_1 EconomicGrowth + \beta_2 Economic Openness + \beta_3 EconomicGrowth * Economic Openness + \sum_{j=1}^{k}\gamma_j Controls_j + \epsilon \]

Note that when we include multiplicative effects, we also need to add separate terms for both of the interacting variables. Interaction effects (sometimes also called moderating effects or conditional effects) allow us to determine whether the effect of one independent variable (binary_growth) on an outcome variable (delta_vote) is conditional on the level of another variable (binary_openness). We refer to two kinds of effects in interaction models as main effects and interaction effects:

Main effect: The beta coefficient of a constituent variable (\(\beta_1\) and \(\beta_2\) in the equation above) represents the relationship between that variable and the outcome delta_vote when the other interacted variable equals zero.

Interaction effect: \(\beta_3\) is the coefficient of the “interaction term”. It captures how the effect of binary_growth changes with the value of binary_openness, and vice versa.

The interpretation of interaction effects is less straightforward than in the additive regression case. In particular, the Beta coefficient of the main effect must not be interpreted as the average effect of a change in \(X\) on \(Y\) as we did in a linear-additive model. Because the effect of economic growth on incumbent vote share is now dependent on the values of trade openness, the magnitude of the effect is given by

\[\beta_1 + \beta_3 * EconomicOpenness\]

That is, we cannot know the effect of binary_growth by just looking at \(\beta_1\). The coefficient estimate of \(\beta_1\) is now merely the effect in a special case: it represents the relationship between economic growth and delta_vote when binary_openness is zero.

When it comes to the effect of economic openness on delta_vote, this is equal to

\[\beta_2 + \beta_3 * EconomicGrowth\]

Likewise, the coefficient \(\beta_2\) by itself only captures the effect of economic openness when binary_growth is zero, i.e., when there is no growth - whether that makes sense or not.

Note that the technical model is agnostic to the role of independent or moderating variables. Usually, we have good reason to assume that the conditional effect works in one direction, not the other one.
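The conditional effect \(\beta_1 + \beta_3 * EconomicOpenness\) comes with its own standard error, which depends on the variances of both coefficients and on their covariance: \(\sqrt{Var(\hat\beta_1) + Z^2 Var(\hat\beta_3) + 2 Z Cov(\hat\beta_1, \hat\beta_3)}\), where \(Z\) is the value of the moderator. The following is a minimal sketch of that calculation on simulated data; sim, y, growth, open and the helper cond_effect() are hypothetical names introduced for illustration only.

```r
# Simulated stand-in data (hypothetical; not the levin dataset)
set.seed(2)
n <- 300
sim <- data.frame(growth = rbinom(n, 1, 0.5),
                  open   = rbinom(n, 1, 0.5))
sim$y <- -6 + 5 * sim$growth + 1 * sim$open -
         4 * sim$growth * sim$open + rnorm(n, sd = 4)

fit <- lm(y ~ growth * open, data = sim)
b <- coef(fit)   # estimated coefficients
V <- vcov(fit)   # their variance-covariance matrix

# Conditional effect of growth at a given value z of the moderator
cond_effect <- function(z) {
  est <- b["growth"] + b["growth:open"] * z
  se  <- sqrt(V["growth", "growth"] +
              z^2 * V["growth:open", "growth:open"] +
              2 * z * V["growth", "growth:open"])
  c(estimate = unname(est), se = unname(se))
}

eff_closed <- cond_effect(0)  # effect of growth when open = 0
eff_open   <- cond_effect(1)  # effect of growth when open = 1
print(rbind(closed = eff_closed, open = eff_open))
```

At z = 0 the function simply returns the coefficient on growth and its usual standard error, which is the "special case" described above.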

Categorical on Categorical Interactions

We now estimate multiplicative interaction models using R. Let’s start with a somewhat simple case. We first examine whether the effect of economic growth on support for incumbents varies by economic openness using binary indicators of each regressor.

The table below shows us how we can predict values of the dependent variable (\(\hat{y}\)) when an interaction term between binary variables (\(X_1\) and \(X_2\)) is added to the model.

\[ \begin{array}{cc|l} X_1 & X_2 & \hat{y} \\ \hline 0 & 0 & \alpha \\ 1 & 0 & \alpha+\beta_1 \\ 0 & 1 & \alpha+\beta_2 \\ 1 & 1 & \alpha+\beta_1+\beta_2+\beta_3 \\ \hline \end{array} \]
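The table can be verified numerically: the predictions returned by predict() for the four combinations of \(X_1\) and \(X_2\) should match the sums of coefficients listed in each row. The following sketch uses simulated data with hypothetical names (sim, y, x1, x2) and no control variables.

```r
# Simulated stand-in data (hypothetical; not the levin dataset)
set.seed(3)
n <- 400
sim <- data.frame(x1 = rbinom(n, 1, 0.5), x2 = rbinom(n, 1, 0.5))
sim$y <- 1 + 2 * sim$x1 - 1 * sim$x2 + 3 * sim$x1 * sim$x2 + rnorm(n)

fit <- lm(y ~ x1 * x2, data = sim)
b <- unname(coef(fit))   # order: intercept, x1, x2, x1:x2

# All four combinations, in the same row order as the table
cells <- expand.grid(x1 = c(0, 1), x2 = c(0, 1))

# Predictions from predict() ...
cells$from_predict <- as.numeric(predict(fit, newdata = cells))

# ... and from adding up coefficients by hand
cells$from_table <- c(b[1],
                      b[1] + b[2],
                      b[1] + b[3],
                      b[1] + b[2] + b[3] + b[4])
print(cells)
```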

We fit the model in R by simply adding the two variables separately and the interaction term by multiplying the two variables:

# ----- Multiplicative interaction model
m2 <- lm(delta_vote ~ binary_growth + binary_openness +
           binary_growth*binary_openness +
           as.factor(int_support) + as.factor(int_oppose) +
           eff_num_parties + gdp_percapita,
           data=levin)

summary(m2)
# GDP growth is associated with ... ?
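A side note on the formula syntax used in m2: in R's formula language, a*b is shorthand for a + b + a:b, so writing the main effects explicitly alongside the product is harmless but redundant. The snippet below uses placeholder names y, a and b (not variables from the levin data) to show that the three spellings yield the same model terms.

```r
# Term labels R derives from three ways of writing the same model
t_full  <- attr(terms(y ~ a + b + a * b), "term.labels")  # as written in m2
t_star  <- attr(terms(y ~ a * b), "term.labels")          # shorthand
t_colon <- attr(terms(y ~ a + b + a:b), "term.labels")    # fully explicit

print(t_full)
print(t_star)
print(t_colon)
```

Because all three formulas expand to the same terms, lm() would estimate the same coefficients whichever spelling is used.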

The coefficient on the interaction term in m2 (-4.2) is statistically significant at the 5% level, indicating that the effect of economic growth on the change in the incumbent vote share is smaller when the country is open to trade. The main effect of binary_growth is positive and significant, whereas the main effect of binary_openness is not statistically distinguishable from zero, so we find no evidence that economic openness affects delta_vote when binary_growth is zero.

To make predictions using this model, we also need to keep control variables constant. We will repeat the steps from the section on comparative statics above with this new model: using the predict() function, we predict the values of delta_vote for open and closed economies depending on whether the country’s economy is growing or not. We specify the values of the variables of interest and the control variables (usually zero for dummies and mean values for continuous ones). We thus obtain fitted values from m2, and plot them.

# predicted values
## Fitted values in non-trading countries
m2_fit_closed <- predict(m2, interval="confidence",
                        newdata = data.frame(binary_growth=c(0,1),
                                binary_openness=0,
                                int_support=0,
                                int_oppose=0,
                                eff_num_parties=mean(
                                levin$eff_num_parties, na.rm=T),
                                gdp_percapita=mean( levin$gdp_percapita, na.rm=T)))

## Fitted values in trading countries
m2_fit_open <- predict(m2, interval="confidence",
                       newdata = data.frame(binary_growth=c(0,1),
                       binary_openness=1,
                       int_support=0,
                       int_oppose=0,
                       eff_num_parties=mean(
                       levin$eff_num_parties, na.rm=T),
                       gdp_percapita=mean(levin$gdp_percapita, na.rm=T)))

# Heterogeneous effect of economic growth by trade openness
m2_fit_closed
m2_fit_open

# ---- Now plot them
# Create empty plot
plot(1:2, m2_fit_closed[,"fit"], type="n",
     xlim=c(-0.1,1.25),
     ylim=c(-10,2),
     xlab="Economic growth",
     ylab="Change in incumbent vote share",
     axes=FALSE,
     main="Heterogeneous effects of economic growth on incumbent support \n by economic openness")
axis(side=1, at=c(0.05,1.05),
     labels = c("No growth", "Economic Growth"))
axis(side=2)

# Add predictions
segments(0:1, m2_fit_closed[,"lwr"], 
         0:1, m2_fit_closed[,"upr"], col="red")
points(0:1,m2_fit_closed[,"fit"], pch=4, col="red")

segments(0.1:1.1, m2_fit_open[,"lwr"], 
         0.1:1.1, m2_fit_open[,"upr"], col="blue")
points(0.1:1.1,m2_fit_open[,"fit"], col="blue")

segments(0.1, m2_fit_open[1,"fit"], 1.1, m2_fit_open[2,"fit"], col="blue")
segments(0, m2_fit_closed[1,"fit"], 1, m2_fit_closed[2,"fit"], col="red")

text(1.1, m2_fit_open[2,"fit"]-2, "Open economies", cex=.85, col="blue")
text(1, m2_fit_closed[2,"fit"]+2, "Closed economies", cex=.85, col="red")

What do we notice by comparing this plot to the first plot we made (without interaction terms)? In this plot, the lines for open and closed economies are not parallel, indicating the heterogeneous effects of economic growth at different levels of openness: in closed economies, economic growth significantly increases the incumbent vote share, as voters reward (or punish) leaders who have control over the country’s economy; in open economies, the effect of economic growth on vote share is mitigated.

The Common Support Assumption

Interaction terms require common support: the data must contain observations for every combination of values of the key independent variable (binary_growth) and the moderator (binary_openness). If some combination is empty, the corresponding conditional effect cannot be estimated from the data and the interaction becomes meaningless. We can check for this by looking at a cross tab.

# -- Common support? Are there observations in all the bins?
table(levin$binary_openness, levin$binary_growth)

As the cross-tabulation shows, the common support assumption holds.
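The cross-tab check can also be turned into an explicit test, and a rough analogue exists for a continuous variable: compare the range it covers within each group of the moderator. The sketch below runs on simulated data; sim, growth_bin, open_bin and growth are hypothetical names, and the range comparison is an assumption about how one might extend the check, not a procedure taken from the text above.

```r
# Simulated stand-in data (hypothetical; not the levin dataset)
set.seed(4)
n <- 150
sim <- data.frame(growth_bin = rbinom(n, 1, 0.5),
                  open_bin   = rbinom(n, 1, 0.4),
                  growth     = rnorm(n, mean = 2, sd = 4))

# Binary-by-binary: every cell of the cross tab should contain observations
tab <- table(open = sim$open_bin, growth = sim$growth_bin)
print(tab)
all_cells_filled <- all(tab > 0)
print(all_cells_filled)

# Continuous-by-binary: range of the continuous variable within each group
ranges <- tapply(sim$growth, sim$open_bin, range)
print(ranges)

# Region of the continuous variable that is observed in both groups
overlap <- c(max(ranges[["0"]][1], ranges[["1"]][1]),
             min(ranges[["0"]][2], ranges[["1"]][2]))
print(overlap)
```

Predictions for values of the continuous variable outside the overlap region rest on extrapolation for at least one of the groups and should be read with caution.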


Categorical on Continuous Interactions

In the previous model (m2), we examined two binary indicators and their interaction. However, economic growth can also be measured continuously. We can consider the conditional effect of economic growth on support for incumbents by level of trade openness using a different interaction term. We specify the same type of equation as above, only now we include a continuous measure of economic growth (gdp_growth) to be moderated by our binary indicator of trade openness (binary_openness).

# ---- Multiplicative interaction model
m3 <- lm(delta_vote ~ gdp_growth + binary_openness +        
         gdp_growth*binary_openness +
         as.factor(int_support) + as.factor(int_oppose) +
         eff_num_parties + gdp_percapita,
        data=levin)

summary(m3)

How do we interpret the coefficients? The coefficient on the interaction term is negative and significant at \(\alpha =0.05\), indicating that the effect of GDP growth on the expected incumbent vote share is weaker in open economies than in closed ones. The coefficient for openness is insignificant, but this is not a cause for concern: that coefficient only tells us the effect of openness when GDP growth is exactly 0; for all other cases we need to consider the interaction term as well.

Just like in the previous section, we should calculate substantively meaningful predicted values (“quantities of interest”) and their respective confidence intervals. The table below shows how we can predict values of the dependent variable (\(\hat{y}\)) when a continuous (\(X_1\)) and a binary variable (\(X_2\)) are interacted.

\[ \begin{array}{cc|l} X_1 & X_2 & \hat{y} \\ \hline 0 & 0 & \alpha \\ \text { Any value } \neq 0 & 0 & \alpha+\beta_1 X_1 \\ 0 & 1 & \alpha+\beta_2 \\ \text { Any value } \neq 0 & 1 & \alpha+\left(\beta_1+\beta_3\right) X_1+\beta_2 \\ \hline \end{array} \]

By including relevant covariates in the predict() function, we can obtain predicted values across the range of economic growth (gdp_growth) for open and closed countries. When predicting the values of delta_vote for open and closed economies along the range of values of economic growth measured as a continuous variable, we need to keep the other covariates constant at a chosen value. Recall that we set binary variables (int_support and int_oppose) to 0, and numeric variables (eff_num_parties and gdp_percapita) to their respective mean.

# ---- Predicted values

## Fitted values in non-trading countries
m3_fit_closed <- predict(m3, interval="confidence",
                         newdata = data.frame(gdp_growth=seq(-20,20,.5),
                         binary_openness=0,
                         int_support=0,
                         int_oppose=0,
                         eff_num_parties= mean(levin$eff_num_parties, na.rm=T),
                         gdp_percapita= mean(levin$gdp_percapita, na.rm=T)))

## Fitted values in trading countries
m3_fit_open <- predict(m3, interval="confidence",
               newdata = data.frame(gdp_growth=seq(-20,20,.5),
               binary_openness=1,
               int_support=0,
               int_oppose=0,
               eff_num_parties= mean(levin$eff_num_parties, na.rm=T),
               gdp_percapita= mean(levin$gdp_percapita, na.rm=T)))

# Heterogeneous effect of economic growth by trade openness
cbind(seq(-20,20,.5), m3_fit_closed, m3_fit_open)

The last command shows the values of \(\hat{y}\) for every half-percentage-point increase in GDP growth, distinguishing between open and closed economies. We can also plot the results with their confidence intervals.

# ---- Plot
plot(seq(-20,20,.5), m3_fit_closed[,"fit"], type = "n",
     ylim=c(-30,15),
     xlab="Level of economic growth",
     ylab="Change in incumbent vote share",
    main="Conditional effects of economic growth on incumbent vote share by levels of trade openness")

lines(seq(-20,20,.5), m3_fit_closed[,"fit"],
      type="l", col = "blue", lwd=2)
lines(seq(-20,20,.5), m3_fit_closed[,"lwr"],
      type="l", lty = "dashed", col = "blue")
lines(seq(-20,20,.5), m3_fit_closed[,"upr"],
      type="l", lty = "dashed", col = "blue")
lines(seq(-20,20,.5), m3_fit_open[,"fit"],
      type="l", col = "red", lwd=2)
lines(seq(-20,20,.5), m3_fit_open[,"lwr"],
      type="l", lty = "dashed", col = "red")
lines(seq(-20,20,.5), m3_fit_open[,"upr"],
      type="l", lty = "dashed", col = "red")

text(10, -20, "Open countries", adj=1, col = "red")
text(10, 10, "Closed countries", adj=1, col = "blue")

Just like in the binary interaction case, we see that the level of GDP growth significantly impacts the incumbent vote share in closed economies, whereas the effect in countries that are open to trade is significantly weaker. Once again, the interaction term allows for heterogeneous effects, represented by non-parallel slopes.


Continuous on Continuous Interactions

Since economic openness can also be measured continuously, we re-estimate the conditional relationship between economic growth and support for incumbent officials using a continuous measure of economic openness: trade as % of GDP (trade_openness). We specify the same type of equation as above, only now we allow a continuous measure of economic growth (gdp_growth) to be moderated by our continuous indicator of economic openness (trade_openness).

\[ \begin{array}{cc|l} X_1 & X_2 & \hat{y} \\ \hline 0 & 0 & \alpha \\ \text { Any value } \neq 0 & 0 & \alpha+\beta_1 X_1 \\ 0 & \text { Any value } \neq 0 & \alpha+\beta_2 X_2 \\ \text { Any value } \neq 0 & \text { Any value } \neq 0 & \alpha+ \beta_1 X_1+\beta_2 X_2+\beta_3 X_1 X_2 \\ \hline \end{array} \]

The logic is very much the same as before - see table above. Note though that the case of a binary moderator is more intuitive, since now both variables vary continuously. Recall that the model is agnostic as to which variable is the focal predictor and which is the moderator. Hence, in the final case (where neither variable is at zero) the terms can be grouped either way, and the way you choose should depend on theory. We might be interested in the effect of \(X_1\) moderated by \(X_2\) - then we would write \((\beta_1 + \beta_3 X_2)X_1\) - or in the effect of \(X_2\) moderated by \(X_1\), which would be \((\beta_2 + \beta_3 X_1)X_2\). Note that both are mathematically equivalent.
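With a continuous moderator, the effect of \(X_1\) is a different number at every value of \(X_2\), so it helps to evaluate \(\beta_1 + \beta_3 X_2\) at a handful of moderator values. The following sketch does this at five quantiles on simulated data; sim, y, growth and openness are hypothetical names and the coefficients used to simulate the data are arbitrary.

```r
# Simulated stand-in data (hypothetical; not the levin dataset)
set.seed(6)
n <- 300
sim <- data.frame(growth   = rnorm(n, mean = 2, sd = 4),
                  openness = runif(n, min = 10, max = 120))
sim$y <- -3 + 1.2 * sim$growth - 0.01 * sim$openness -
         0.008 * sim$growth * sim$openness + rnorm(n, sd = 3)

fit <- lm(y ~ growth * openness, data = sim)
b <- coef(fit)

# Values of the moderator at which to evaluate the effect of growth
q <- quantile(sim$openness, c(.10, .25, .50, .75, .90))

# Effect of a one-unit increase in growth at each moderator value:
# beta_1 + beta_3 * X_2
effect_of_growth <- b["growth"] + b["growth:openness"] * q
print(round(effect_of_growth, 3))
```

Each element of effect_of_growth is the slope of one fitted line in a plot of predicted values against growth at that level of the moderator.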

We plot the relationship in exactly the same way as before but this time predict values for a few values of our continuous measure of trade openness (we pick the 10th and 90th percentile, the quartiles and the median for now).

# Inspect new variable
summary(levin$trade_openness)
hist(levin$trade_openness) # Skewed...

# ------ Interaction model with continuous X_1 and continuous X_2
m4 <- lm(delta_vote ~ gdp_growth + trade_openness +            
           gdp_growth*trade_openness +
           as.factor(int_support) + as.factor(int_oppose) +
           eff_num_parties + gdp_percapita,
           data=levin)
summary(m4)

# ---- Predicted values
## Fitted values across the range of trade openness
m4_fit_10 <- predict(m4, interval="confidence",
                  newdata = data.frame(gdp_growth=seq(-20,20,.5),
                  trade_openness=quantile(levin$trade_openness, .10, na.rm=T),
                  int_support=0,
                  int_oppose=0,
                  eff_num_parties=mean(levin$eff_num_parties, na.rm=T),
                  gdp_percapita=mean(levin$gdp_percapita, na.rm=T)))

m4_fit_25 <- predict(m4, interval="confidence",
                 newdata = data.frame(gdp_growth=seq(-20,20,.5),
                 trade_openness=quantile(levin$trade_openness, .25, na.rm=T),
                 int_support=0,
                 int_oppose=0,
                 eff_num_parties=mean(levin$eff_num_parties, na.rm=T),
                 gdp_percapita=mean(levin$gdp_percapita, na.rm=T)))

m4_fit_50 <- predict(m4, interval="confidence",
                 newdata = data.frame(gdp_growth=seq(-20,20,.5),
                 trade_openness=quantile(levin$trade_openness, .50, na.rm=T),
                 int_support=0,
                 int_oppose=0,
                 eff_num_parties=mean(levin$eff_num_parties, na.rm=T),
                 gdp_percapita=mean(levin$gdp_percapita, na.rm=T)))

m4_fit_75 <- predict(m4, interval="confidence",
                 newdata = data.frame(gdp_growth=seq(-20,20,.5),
                 trade_openness=quantile(levin$trade_openness, .75, na.rm=T),
                 int_support=0,
                 int_oppose=0,
                 eff_num_parties=mean(levin$eff_num_parties, na.rm=T),
                 gdp_percapita=mean(levin$gdp_percapita, na.rm=T)))

m4_fit_90 <- predict(m4, interval="confidence",
                 newdata = data.frame(gdp_growth=seq(-20,20,.5),
                 trade_openness=quantile(levin$trade_openness, .90, na.rm=T),
                 int_support=0,
                 int_oppose=0,
                 eff_num_parties=mean(levin$eff_num_parties, na.rm=T),
                 gdp_percapita=mean(levin$gdp_percapita, na.rm=T)))

# Heterogeneous effect of economic growth by trade openness
plot(seq(-20,20,.5), m4_fit_10[,"fit"], type = "n",
     ylim=c(-20,10), xlim=c(-21,21), axes=F,
     xlab="Level of economic growth",
     ylab="Change in incumbent vote share",
     main="Conditional effects of economic growth on incumbent vote share \n by levels of trade openness")
axis(1); axis(2)

lines(seq(-20,20,.5), m4_fit_10[,"fit"], type="l", col = "red", lwd=2)
lines(seq(-20,20,.5), m4_fit_25[,"fit"], type="l", col = "orange", lwd=2)
lines(seq(-20,20,.5), m4_fit_50[,"fit"], type="l", col = "yellow", lwd=2)
lines(seq(-20,20,.5), m4_fit_75[,"fit"], type="l", col = "blue", lwd=2)
lines(seq(-20,20,.5), m4_fit_90[,"fit"], type="l", col = "violet", lwd=2)

text(21, 10, "Closed countries", adj=1, col = "red")
text(21, -10, "Open countries", adj=1, col = "violet")
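The five predict() calls above differ only in the quantile of the moderator, so they can be generated in a loop. The following is one possible sketch using lapply() on simulated data; sim, y, growth, openness, parties, growth_grid and pct are hypothetical names, and the model is a simplified stand-in for m4 with a single control variable.

```r
# Simulated stand-in data (hypothetical; not the levin dataset)
set.seed(7)
n <- 300
sim <- data.frame(growth   = rnorm(n, mean = 2, sd = 4),
                  openness = runif(n, min = 10, max = 120),
                  parties  = runif(n, min = 2, max = 8))
sim$y <- -3 + 1.2 * sim$growth - 0.01 * sim$openness -
         0.008 * sim$growth * sim$openness - 0.5 * sim$parties +
         rnorm(n, sd = 3)

fit <- lm(y ~ growth * openness + parties, data = sim)

growth_grid <- seq(-20, 20, .5)      # values of growth to predict over
pct <- c(10, 25, 50, 75, 90)         # percentiles of the moderator

# One matrix with columns fit/lwr/upr per percentile of the moderator
fits <- lapply(pct, function(p) {
  predict(fit, interval = "confidence",
          newdata = data.frame(
            growth   = growth_grid,
            openness = unname(quantile(sim$openness, p / 100, na.rm = TRUE)),
            parties  = mean(sim$parties, na.rm = TRUE)))
})
names(fits) <- paste0("q", pct)

# Inspect the first rows of the predictions at the median of the moderator
print(head(fits[["q50"]]))
```

Each element of fits has the same layout as the individual matrices created above, so it can be passed to lines() in the same way, for example inside a for loop over seq_along(fits).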

Non-Linearity

Whenever we model a relationship with OLS, we assume that the model is linear in its parameters (this is actually the first OLS assumption). If \(X\) enters the model only as a single untransformed term, this implies that if \(X\) and \(Y\) are positively related at low values of \(X\), they are positively related to the same extent at high values of \(X\). In reality, this often is not the case. Think, for instance, of the relationship between age and savings: you start your life without savings, but your savings typically increase until retirement age, and decline thereafter. A linear relationship between savings and age does not allow you to accurately capture this relationship.

We can apply a similar reasoning to our analysis. The coefficient on the eff_num_parties in m4 is negative, indicating that as a larger number of parties compete for seats, the incumbent’s electoral support decreases, holding other covariates constant. However, we may think that electoral volatility may not be a linear function of the number of parties competing in elections. Stable two-party systems may have high volatility from election to election, regimes with 3-5 parties may have less volatility, but regimes with a large number of parties may return to high volatility. In this case, we may think that the effect of the effective number of parties on support for incumbents may be non-linear.

How do we account for non-linearity in regression? We can model this type of relationship by adding a polynomial term to the regression equation. Specifically, we add a polynomial in the variable(s) where we think a non-linear relationship exists, with as many terms as the degree of the polynomial which we think best expresses the relationship: two for a quadratic relationship (linear and squared terms), three for a cubic relationship (linear, squared, and cubic terms), and so on. The polynomial is flexible, in the sense that the model does not impose in advance whether the estimated relationship is convex or concave, that is, whether it is U-shaped or inverted U-shaped.
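
As a rough illustration of adding polynomial terms, the sketch below fits a linear, a quadratic and a cubic specification to simulated data whose true relationship is U-shaped. The variables `x` and `y` and all coefficients are invented for this example.

```r
set.seed(1)

# Simulated data with a U-shaped (convex) relationship
x <- runif(200, min = 0, max = 10)
y <- 5 - 3 * x + 0.4 * x^2 + rnorm(200)

fit_lin  <- lm(y ~ x)                     # linear term only
fit_quad <- lm(y ~ x + I(x^2))            # linear and squared terms
fit_cub  <- lm(y ~ x + I(x^2) + I(x^3))   # linear, squared and cubic terms

# Without I(), ^ is read as a formula operator and x^2 collapses to x,
# so this model silently contains no squared term
fit_bad <- lm(y ~ x + x^2)
length(coef(fit_bad))    # same number of coefficients as fit_lin

coef(fit_quad)           # the sign of the squared term gives the shape

# F-tests comparing the nested specifications
anova(fit_lin, fit_quad, fit_cub)
```

In this simulated case the squared term should improve the fit clearly, while the cubic term should add little, since the data were generated from a quadratic.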

In this case, we adjust the OLS model from before to incorporate the quadratic effect of eff_num_parties on support for incumbents in the following specification:

\[\Delta Incumbent Vote Share = \alpha + \beta_1 EconomicGrowth + \beta_2 Economic Openness + \beta_4 EffectiveNumberParties + \beta_5 EffectiveNumberParties^2 + \sum_{i=1}^{k}\beta_i Controls_i + \epsilon \]

Note: Here we are keeping the original specification (m1) without the interaction term to keep it simple.

R will estimate a coefficient for the linear (“lower order”) and squared (“higher order”) eff_num_parties terms. The resulting regression relationship will no longer be linear but curvilinear, and in particular, it will be either U-shaped (convex) or inverted U-shaped (concave). The sign of the coefficient for the quadratic term (in this case, \(\beta_5\)) indicates whether the relationship is convex or concave (see the figure below for a visual representation):

• A U-shaped/convex shape (represented by a positive coefficient for the quadratic term) indicates that \(X\) and \(Y\) are negatively associated until a certain point, then positively associated (e.g. the relationship between months of the year and rainfall in Germany).

• An inverted U-shaped/concave shape (represented by a negative coefficient for the quadratic term) means that \(X\) and \(Y\) are positively associated until a certain point, then negatively associated (think of the example of age and savings).
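
To see why the sign of the quadratic coefficient determines the shape, it may help to differentiate the specification above with respect to the effective number of parties. Here \(N\) is shorthand for EffectiveNumberParties, a label introduced only for this derivation:

\[\frac{\partial \, \Delta IncumbentVoteShare}{\partial N} = \beta_4 + 2 \beta_5 N\]

Setting this slope to zero gives the turning point of the curve:

\[N^{*} = -\frac{\beta_4}{2 \beta_5}\]

The second derivative is \(2\beta_5\). If \(\beta_5 > 0\), the turning point is a minimum and the curve is U-shaped; if \(\beta_5 < 0\), it is a maximum and the curve is inverted U-shaped. The marginal effect of the number of parties is therefore not a single number but depends on the value of \(N\) at which it is evaluated.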

Estimating a regression with a polynomial term is straightforward in R. Consider the following code to estimate a regression model with a quadratic term. Note that we wrap the squared term in I(): inside a formula, ^ is otherwise read as a formula operator rather than as arithmetic, so I() tells R to square the values of eff_num_parties before estimation.

# Evaluate the distribution of eff_num_parties
hist(levin$eff_num_parties, breaks=seq(0,4,.1))
summary(levin$eff_num_parties)

# --- Non-linear model
m5 <- lm(delta_vote ~ binary_growth + binary_openness +
      as.factor(int_support) + as.factor(int_oppose) +
      eff_num_parties + I(eff_num_parties^2) + gdp_percapita,
      data=levin)

summary(m5)

While the coefficient for the linear eff_num_parties term remains negative, the coefficient on its quadratic term is positive and statistically significant, although only at the 10% level (i.e. the 90% confidence level). This suggests that political systems with very few or very many parties are more volatile than regimes with an intermediate number of parties, and thus show larger changes in delta_vote. The positive coefficient also means that the relationship between eff_num_parties and the outcome variable is U-shaped or convex. We can visualize this by using the predict() function, calculating the fitted values (\(\hat{y}\)) at different values of eff_num_parties.

# Obtain predicted values and plot across the range of eff_num_parties
# (all other covariates are held at their median, mean or reference category)
m5_pred_parties <- predict(m5,
                           interval = "confidence",
                           newdata = data.frame(
                             binary_growth = median(levin$binary_growth, na.rm = TRUE),
                             binary_openness = median(levin$binary_openness, na.rm = TRUE),
                             int_support = 0,
                             int_oppose = 0,
                             eff_num_parties = seq(0, 3.95, .1),
                             gdp_percapita = mean(levin$gdp_percapita, na.rm = TRUE)))

plot(seq(0,3.95,.1), m5_pred_parties[,"fit"], type="l",
     xlim=c(0,4), ylim=c(-15,50), 
     xlab="Effective number of parties",
     ylab="Change in Incumbent Vote Share",
     main="Non-linear effect of the number of parties \n on incumbent vote share")
lines(seq(0,3.95,.1), m5_pred_parties[,"lwr"])
lines(seq(0,3.95,.1), m5_pred_parties[,"upr"])

As expected, the function is convex: the expected value of \(Y\) is positive at first, then falls as the effective number of parties increases, and rises again as the effective number of parties approaches 3 to 4. The confidence intervals also grow wider as eff_num_parties increases, which suggests that there are fewer observations (and thus more uncertainty) at larger values of the predictor.
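
The point at which the curve stops falling and starts rising can also be computed from the coefficients as minus the linear coefficient divided by twice the quadratic coefficient. The sketch below wraps this in a small helper. The function `turning_point()` is a hypothetical helper written for this example, not part of any package, and it is demonstrated on simulated data (`toy`) rather than on levin.

```r
# Hypothetical helper: turning point of a fitted quadratic,
# -b_linear / (2 * b_squared)
turning_point <- function(model, linear, squared) {
  b <- coef(model)
  unname(-b[linear] / (2 * b[squared]))
}

set.seed(7)

# Simulated stand-in data with a true minimum at 12 / (2 * 3) = 2
toy <- data.frame(parties = runif(300, min = 0, max = 4))
toy$y <- 10 - 12 * toy$parties + 3 * toy$parties^2 + rnorm(300)

toy_fit <- lm(y ~ parties + I(parties^2), data = toy)

tp <- turning_point(toy_fit, "parties", "I(parties^2)")
tp   # should be close to the true minimum at 2

# Is the turning point inside the observed range of the predictor?
tp >= min(toy$parties) && tp <= max(toy$parties)
```

Applied to the model above, the call would be turning_point(m5, "eff_num_parties", "I(eff_num_parties^2)"), assuming m5 still refers to the quadratic model at that point in your script. If the turning point lies outside the observed range of the predictor, the data only support one arm of the U and the curve should not be read as evidence of a reversal.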


Using ggplot2 and sjPlot to Plot Predicted Values

It is important to know how interaction terms are computed, and to be able to plot them manually. Nonetheless, we do not have to do this exercise every single time we are interested in an interaction effect. The plot_model function from the package sjPlot allows us to automatically plot predicted values from regression models, simply by specifying the variables we want to use with the argument terms, and setting the argument type to either "pred" (predicted values) or "int" (interaction). plot_model returns a ggplot2 object, so you pass additional settings (e.g. specifying the axis labels, or changing the colours) by adding a + to the plot call and using other functions from ggplot2. On the plus side, plot_model does most of the work for you, so you will have less to code than with base R.

# Install packages (only needed once)
install.packages("sjPlot")
install.packages("ggplot2")

library(sjPlot)
library(ggplot2)

# Recode dummies as character
# We do this so sjPlot knows to treat them as nominal
# ("Closed Economy", "Open Economy") rather than interval
# (0, 1) variables.

levin$Growth <- NA
levin$Growth[levin$binary_growth == 1] <- "growth > 2%"
levin$Growth[levin$binary_growth == 0] <- "growth < 2%"

levin$Openness <- NA
levin$Openness[levin$binary_openness == 1] <- "Open Economy"
levin$Openness[levin$binary_openness == 0] <- "Closed Economy"
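
An alternative way to sketch the same recode is factor(), which assigns the labels and fixes the order of the categories in one step. The vector below is a toy stand-in with invented values, not the actual levin$binary_openness column.

```r
# Toy stand-in for levin$binary_openness (values invented for illustration)
binary_openness <- c(0, 1, 1, NA, 0)

openness <- factor(binary_openness,
                   levels = c(0, 1),
                   labels = c("Closed Economy", "Open Economy"))

levels(openness)                   # the first level is the reference category in lm()
table(openness, useNA = "ifany")   # missing values stay missing
```

The design choice here is explicit control: with a factor, the reference category is whatever level is listed first, rather than being left to the alphabetical ordering that R applies to character variables.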

Plotting Predicted Values (Two Categorical Variables) without Interaction

m1 <- lm(delta_vote ~ Growth + Openness +
         int_support + int_oppose +
         eff_num_parties + gdp_percapita,
         data=levin)

plot_model(m1, type = "pred", terms = c("Growth", "Openness"),
           title = "Predicted Values of Delta Vote (No Interaction Model)") +
           xlab("Growth") +
           ylab("Predicted change in incumbent vote share") +
           theme_minimal() #removes ugly grey background

Plotting Predicted Values - One Continuous Variable with Polynomial

m2 <- lm(delta_vote ~ Growth + Openness +
         int_support + int_oppose +
         eff_num_parties + I(eff_num_parties^2) + gdp_percapita,
         data=levin)

plot_model(m2, type = "pred", terms = "eff_num_parties",
           title = "Predicted Values of Delta Vote\nBy Effective Number of Parties") +
           xlab("Effective Number of Parties") +
           ylab("Predicted change in incumbent vote share") +
           theme_minimal() #removes ugly grey background

Plotting Predicted Values - Interaction between Two Categorical Variables

m3 <- lm(delta_vote ~ Growth*Openness +
         int_support + int_oppose +
         eff_num_parties + gdp_percapita,
         data=levin)

plot_model(m3, type = "int", terms = c("Growth", "Openness"),
           title = "Predicted Values of Delta Vote (Interaction Model)") +
           xlab("Growth") +
           ylab("Predicted change in incumbent vote share") +
           theme_minimal() #removes ugly grey background

Plotting Predicted Values - Interaction between Categorical and Continuous Variables

m4 <- lm(delta_vote ~ gdp_growth*Openness +
         as.factor(int_support) + as.factor(int_oppose) +
         eff_num_parties + gdp_percapita,
         data=levin)

plot_model(m4, type = "int", terms = c("gdp_growth", "Openness"),
           title = "Predicted Values of Delta Vote (Interaction Model)") +
           xlab("GDP Growth") +
           ylab("Predicted change in incumbent vote share") +
           theme_minimal() #removes ugly grey background

Plotting Predicted Values - Interaction between Two Continuous Variables

m5 <- lm(delta_vote ~ gdp_growth*trade_openness +
         as.factor(int_support) + as.factor(int_oppose) +
         eff_num_parties + gdp_percapita,
         data=levin)

# If you want to select specific values, you can do this using the following syntax: e.g., "trade_openness [0,2]"
plot_model(m5, type = "int", terms = c("gdp_growth", "trade_openness"),
           title = "Predicted Values of Delta Vote (Interaction Model)") +
           xlab("GDP Growth") +
           ylab("Predicted change in incumbent vote share") +
           theme_minimal() #removes ugly grey background 

Other Packages

Another good R package for creating interaction plots is interplot, which produces ggplot2-based plots that can be customised with the same + syntax. For those who have already used ggplot2, or would like to in the future, both ggplot2 and interplot are good alternatives to the base-R plots we have mainly used in this course.